Introduction

The following datasets are used in this demonstration:

These datasets are also available together in the GitHub repository for this demonstration.

Table of contents

  1. Population and household estimates (univariate)

    1.1 Gender ratio

    Basic pie chart

    1.2 Population and gender ratio by region

    How to use subplots to group charts together, and how to use annotations to add more details to the plot.

    1.3 Population by outward postcode using choropleth map

    How to plot location-related data on a map using GeoJSON.

    1.4 Population by postcode parent area using choropleth map

    How to merge fine-grained geographic areas into larger ones and plot them on the map.

  2. House prices and earnings (multivariate)

    2.1 Correlation between median house price and median household earning

    How to use a combination of scatter plot, trendline, box plot and rug plot to show distributions and correlations between variables.

    2.2 Median and lower quartile house prices from 2002 to 2020 (animated)

    How to use an animated chart to show trends as one variable changes.

  3. Text analysis and visualisation

    3.1 Text tokenization and word cloud

    How to perform a simple text tokenization and use word cloud to show word frequencies.

    3.2 Basic sentiment analysis with VADER

    How to perform a simple sentiment analysis and use grouped line charts.


Preparation

Please run this code block before running any other blocks in this notebook.

1. Population and household estimates

1.1 Gender ratio

1.2 Population and gender ratio by region

1.3 Population by outward postcode using choropleth map

Concatenate the individual postcode GeoJSON mappings into a single variable, geojson_uk.
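The concatenation step might look like the following minimal sketch. It assumes each postcode area is stored as its own GeoJSON FeatureCollection. The helper name `merge_feature_collections` and the sample features are hypothetical and are not taken from the notebook.

```python
def merge_feature_collections(collections):
    """Combine several GeoJSON FeatureCollections into a single one."""
    features = []
    for collection in collections:
        # A FeatureCollection stores its shapes under the "features" key.
        features.extend(collection.get("features", []))
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data standing in for the per-area GeoJSON mappings.
area_ab = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "AB10"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        }
    ],
}
area_b = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "B1"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]],
            },
        }
    ],
}

geojson_uk = merge_feature_collections([area_ab, area_b])
print(len(geojson_uk["features"]))
```

Keeping the result as a plain dictionary means it can be passed directly to any plotting function that accepts a GeoJSON-style mapping.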

The open-source geographical data used here comes from Wikipedia and does not cover every region of England, so the uncovered regions appear as white areas on the map.

1.4 Population by postcode parent area using choropleth map

In the plot above, the regions are probably too fine-grained to show an obvious trend. Let's merge them into parent areas.
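One way to roll outward codes up into parent areas is sketched below. It assumes the parent area is the one or two leading letters of the outward code (for example, "SW1A" belongs to "SW"). The function names and the population figures are made up for illustration. The sketch covers only the aggregation of values; merging the map shapes themselves is a separate step.

```python
import re
from collections import defaultdict


def postcode_area(outward_code):
    """Return the leading letters of an outward code, e.g. 'SW1A' -> 'SW'."""
    match = re.match(r"[A-Za-z]{1,2}", outward_code.strip())
    if match is None:
        raise ValueError(f"Not a valid outward code: {outward_code!r}")
    return match.group(0).upper()


def population_by_area(population_by_outward):
    """Sum populations of outward codes that share the same parent area."""
    totals = defaultdict(int)
    for outward_code, population in population_by_outward.items():
        totals[postcode_area(outward_code)] += population
    return dict(totals)


# Made-up figures, for illustration only.
sample = {"SW1A": 100, "SW2": 250, "M1": 400, "M20": 50}
area_totals = population_by_area(sample)
print(area_totals)  # {'SW': 350, 'M': 450}
```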

2. House prices and earnings

Define a reusable function to read different sheets from the source data file.

2.1 Correlation between median house price and median household earning

2.2 Median and lower quartile house prices from 2002 to 2020 (animated)

3. Text-based analysis and visualisation

3.1 Text tokenization and word cloud
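A simple tokenization and word-frequency count can be sketched with the standard library alone. The stop-word list, function names, and sample texts below are assumptions for illustration; the notebook may use a different tokenizer.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count how often each non-stop-word token appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(token for token in tokenize(text) if token not in stopwords)
    return counts


# Made-up sample texts, for illustration only.
sample_texts = ["The housing market is rising", "Housing prices and the market"]
frequencies = word_frequencies(sample_texts)
print(frequencies.most_common(2))
```

A mapping of word to count like this is the kind of input that the `wordcloud` package's `WordCloud.generate_from_frequencies` method accepts, should that package be used for the plot.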

3.2 Basic sentiment analysis with VADER

Use nltk's built-in VADER model to perform a basic sentiment analysis on these tweets without the need for training.
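A minimal sketch of this step follows. It assumes `nltk` is installed and that the VADER lexicon can be downloaded once. The function names are hypothetical, and the 0.05 cut-off is the conventional threshold for VADER's compound score rather than something stated in this notebook.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score (from -1 to 1) to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(tweets):
    """Score each tweet with nltk's pretrained VADER analyzer.

    nltk is imported inside the function so that label_sentiment
    can still be used when nltk is not installed.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # One-off download of the lexicon that VADER relies on.
    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()

    results = []
    for tweet in tweets:
        # polarity_scores returns 'neg', 'neu', 'pos' and 'compound' values.
        compound = analyzer.polarity_scores(tweet)["compound"]
        results.append((tweet, compound, label_sentiment(compound)))
    return results
```

Calling `score_tweets` with a list of tweet strings returns one `(tweet, compound, label)` tuple per tweet, which can then be grouped for plotting.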